31. (Twice Amended) The peptide of claim 1, wherein the peptide contains at least one D-amino acid substitution.

32. (Twice Amended) The peptide of claim 1, wherein the peptide contains at least one
variant amino acid substitution selected from the group comprising Cα-methylamino acids,
Nα-methylamino acids and α,β-unsaturated amino acids.

33. (Twice Amended) The peptide of claim 1, wherein the peptide is cyclized.



61. (Amended) A purified peptide comprising the amino acid sequence of SEQ ID NO: 18.



Pursuant to 37 CFR § 1.121(c)(1)(ii), a marked up version of the claims showing the
changes made appears as Appendix B of this Amendment. 

In the Drawings: 

Please enter enclosed formal FIGS 1-16 in replacement for the formal drawings of record. 

Remarks 

Upon entry of the amendments made herein, claims 1-18, 31-34, 39, 43-44, 46-47, 49-55,
and 57-61 are currently pending in the case. Amendments to the legends for FIGS. 2, 3, 4, 6, 10,
11 and 13, on page 11, line 21, through page 12, line 21, reflect the modified FIG. labels in the
replacement formal drawings submitted herewith, as required under 37 CFR § 1.84(u)(1) and in
the Office Action. Support for the amendment on page 48, line 7, of the specification can be
found in the specification on page 78, line 9.

Claims 45, 48 and 56 have been cancelled without prejudice or disclaimer as they are 
drawn to a nonelected species. Claims 4 and 6 have been amended to remove recitation of "may 
be," with amendments supported by claim 1 as originally filed. Claims 12 and 61 have been 
amended to recite amino acid "sequences" rather than "residues", with amendments supported in 
the specification at least, e.g., at page 19, lines 11-12. Claims 18 and 31-33 have been amended 
to correctly refer to a single peptide of claim 1, with amendments supported by claim 1 as
originally filed. No new matter has been added. 

Applicants' election of the species encompassed by SEQ ID NO: 18 is noted in the Office
Action. However, Applicants respectfully dispute the Examiner's assertion (see page 2,
paragraph 2 of the Office Action) that no allowable generic or linking claims exist pursuant to
the nonelected species. At the very least, SEQ ID NO:2 and SEQ ID NO:18 are mouse and
human homologs of the elected peptide of the invention, and the generic sequence of claim 4 
encompasses the same. Support for this characterization appears in the specification, e.g., at 
least on page 8, lines 9-13, and in FIG. 16. Applicants believe this interpretation to be correct, 
and their comments below are presented accordingly. 

Objections and rejections have been applied to the specification and pending claims. 
Each will be addressed in turn, below. 

Drawings: 

The Examiner has required that FIGS. 2, 3, 4, 6, 10, 11 and 13 be amended to meet the
requirements under 37 C.F.R. §1.84(u)(1). Corrected final drawings are submitted herewith.
Corresponding amendments have also been made to correct the "DESCRIPTION OF THE 
FIGURES" section. 

Objection to the Specification: 

The Examiner states that the specification was objected to because of an informality on 
page 48, line 7, characterized as needing a sequence identifier. The specification has been 
amended to contain the appropriate sequence identifier. Applicants respectfully submit that this 
objection is now moot and request that it be withdrawn. 

Rejections under 35 U.S.C. §112, first paragraph

Claims 1-3, 7, 8-18, 31-34 and 39 were rejected by the Examiner under 35 U.S.C. §112, 
first paragraph, for encompassing purified leptin "homologs, analogs and derivatives," terms 
which the Office Action characterizes as not sufficiently described in the specification. 
Applicants traverse this rejection as applied to the claims as now pending on the grounds that
these are common terms known to those skilled in the chemical arts and are clearly given their 
ordinary meaning within the specification. 

Long-standing dictionary definitions are provided herewith for "homologue" (Exhibit 1),
"homologous" (Exhibit 2), "analog" (Exhibit 3), and "derivative" (Exhibit 4). Applicants note
that these terms, given their ordinary meaning, reasonably convey the full scope of the subject
matter of the invention to one skilled in the art, and meet all written description requirements. In
addition, specific examples of each variant type are disclosed and supported in the specification.
Homologs are supported at least, e.g., on page 8, lines 9-13, disclosing SEQ ID NOS:2 and 18 as
mouse and human variants. Analogs are described and supported at least, e.g., on page 56, line
6, through page 57, line 9, disclosing cyclization of leptin peptides. Derivatives are described
and supported at least, e.g., on page 55, line 14, through page 56, line 5, disclosing D-amino acid
substituted leptin peptides. Specifically contemplated peptide designs are described at least on
page 54, line 17, through page 58, line 4. Applicants respectfully submit that the specific and
even general disclosure relating to the terms "homolog", "homologous", "analog", and
"derivative" is amply sufficient to meet the requirements of 35 U.S.C. §112, first paragraph.
Applicants therefore respectfully submit that withdrawal of the rejection is proper and in order. 

Claims 1-4, 6-18, 31-34, 39, 45, 48, 56 and 61 were rejected under 35 U.S.C. §112, first
paragraph, as allegedly lacking enablement. Specifically, the Examiner argues that the
specification is enabling for a leptin fragment comprising the amino acid sequence of SEQ ID
NO:2 or 18, but that leptin peptides encompassing variations in % homology, amino acid
substitutions, derivatives, etc., are not enabled. Applicants traverse this rejection as applied to 
the claims as now pending. 

As stated above, the specification on page 54, line 17, through page 58, line 4, clearly
discloses cyclized leptin peptides and D-amino acid substituted leptin peptides, among others.
One of ordinary skill in the chemical art will clearly recognize that such a modified leptin peptide
falls within the dictionary definition of derivative, as either "a chemical substance related
structurally to another substance and theoretically derivable from it" or as "a substance that can
be made from another substance" (Exhibit 4). On page 55, lines 24-27, derivatized leptin
peptides are explicitly described where, e.g., "series of peptide analogs will be synthesized, each
of which will contain a single D-amino acid which corresponds to its L-isomer in the native 
sequence." 
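The D-amino acid series quoted above (one analog per position, each with a single L-to-D swap) can be enumerated mechanically. The following is a minimal Python sketch, not the specification's procedure. It assumes the common notation of writing a D-residue as a lowercase one-letter code. The function name `d_amino_acid_scan` and the input "SCHLPWA" are hypothetical: the input is simply one instance of the claim 4 generic sequence built from the claim 6 alternatives (His, Trp, Ala) and is not asserted to be SEQ ID NO:18.

```python
def d_amino_acid_scan(sequence: str) -> list[str]:
    """Return one analog per chiral position, each with a single D-residue.

    Assumes `sequence` is an all-L peptide in upper-case one-letter code and
    marks the D-isomer at a position by lower-casing that letter (a common
    notation, assumed here). Glycine (G) is achiral, so no D-analog is made
    at glycine positions.
    """
    analogs = []
    for i, residue in enumerate(sequence):
        if residue == "G":
            continue  # glycine has no D-isomer
        analogs.append(sequence[:i] + residue.lower() + sequence[i + 1:])
    return analogs


if __name__ == "__main__":
    # Hypothetical 7-residue instance of claim 4 (Ser-Cys-His-Leu-Pro-Trp-Ala).
    for analog in d_amino_acid_scan("SCHLPWA"):
        print(analog)
```

Run on the seven-residue example, this prints seven analogs, from sCHLPWA through SCHLPWa.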

Determination of percent homology has been routinely done for decades, and is common
knowledge for a person skilled in the art. See, e.g., Needleman and Wunsch (1970) J. Mol. Biol.
48: 443-453 (Exhibit 5). As an example, a skilled artisan will know to compare the mouse and
human peptide homologs of SEQ ID NOS:2 and 18, determine that three out of seven residues
differ, and calculate that the remaining four out of seven identical residues correspond to a percent
homology of 57%. Simple mathematical modifications of this formula, widely practiced by those
skilled in the art, are likewise used to determine the number of residues that can differ, e.g., in a
seven-residue peptide (SEQ ID NOS:2 and 18) or a fifteen-residue peptide (SEQ ID NOS:3-10),
while still yielding a polypeptide with at least 70%, or greater than 85% or 95%, homology to
such peptide.
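The arithmetic described above can be sketched as follows. This is a minimal Python illustration that treats "percent homology" as percent identity over an ungapped, position-by-position comparison of equal-length peptides, as in the seven-residue example. The function names are hypothetical. The two test sequences are hypothetical instances of the claim 4 generic sequence built from the alternatives recited in claim 6, and are not asserted to be SEQ ID NOS:2 and 18. The "greater than" thresholds are read as strict inequalities.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions with identical residues in an ungapped comparison."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def max_differences(length: int, threshold: float, strict: bool = False) -> int:
    """Most residues that may differ while identity stays at or above `threshold`
    percent (or strictly above it when strict=True). Returns -1 if even an
    identical peptide fails the test (e.g. strict=True with threshold=100)."""
    allowed = -1
    for diffs in range(length + 1):
        identity = 100.0 * (length - diffs) / length
        passes = identity > threshold if strict else identity >= threshold
        if not passes:
            break  # identity only falls as diffs grows
        allowed = diffs
    return allowed


if __name__ == "__main__":
    # Hypothetical instances of claim 4 using the claim 6 alternatives.
    print(round(percent_identity("SCHLPWA", "SCSLPQT")))  # 4 of 7 positions identical
    for n in (7, 15):
        print(n, max_differences(n, 70), max_differences(n, 85, strict=True),
              max_differences(n, 95, strict=True))
```

For the thresholds named above (at least 70%, greater than 85%, greater than 95%), this gives at most 2, 1 and 0 differing residues for a seven-residue peptide and 4, 2 and 0 for a fifteen-residue peptide.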

As for the case law cited by the Examiner, the disclosures at issue in those cases are not
comparable to the present invention. For instance, In re Fisher is cited for the
proposition that an inventor not be allowed to obtain broad claim scope in the absence of 
"reasonable correlation" to the scope of enablement in the specification. In the present case, 
Applicants have provided specific peptides with leptin activity, and have explicitly described 
methods of preparation, physical or chemical properties, and other characteristics sufficient to 
define a leptin peptide of the invention. See, e.g., the specification at least at page 54, line 17, 
through page 58, line 4. The reference to Amgen is similarly inapplicable, because the disclosure
in Amgen lacked even a "mental picture of the structure of the chemical." See, Amgen Inc. v.
Chugai Pharmaceuticals Co. Ltd., 18 U.S.P.Q.2d 1016. Applicants' claimed invention requires,
inter alia, leptin peptides having body mass modulating ability, while retaining strong homology 
to specifically recited peptide sequences. In addition, and as stated in the paragraph above, 
determination of homologs, analogs and derivatives of peptides of known sequence, or of percent
homology to such sequences, is well known and commonly practiced in the art, and is therefore
within the level of ordinary skill in the art. For instance, at the time this application was filed,
leptin peptides from many different species were already cloned and sequenced, and determined 
to be homologs across many species. See, e.g., Appendix C, wherein leptin clones publicly 
disclosed prior to August 1999 are aligned and identified by GenBank accession number and by
genus species designation. It therefore follows that reference to Clark is also inappropriate,
because the Examiner has specifically acknowledged disclosure by the Applicant of at least two
species (murine and human) in the specification, and many additional species were provided by
those skilled in the art at the time of the invention. See, e.g., Appendix C.

In summary, Applicants believe that the disclosure at least at page 54, line 17, through page 58,
line 4, of the specification provides sufficient enablement under 35 U.S.C. §112, first paragraph.

Applicants submit that these 35 U.S.C. §112, first paragraph, rejections are moot or
overcome as described above, and request that these rejections be withdrawn. 

Rejections under 35 U.S.C. §112, second paragraph 

Claims 4, 6, 8-12, 18, 31-33, 45, 48, 56 and 61 were rejected under 35 U.S.C. §112,
second paragraph, as indefinite for reciting various phrases the Examiner found objectionable.
Applicants believe these rejections are moot as applied to the claims as now pending. Each 
phrase rejection will be addressed in turn. 

Claims 4 and 6 were rejected for reciting "may be" instead of the more definite verb "is." 
Claims 4 and 6 have been amended as suggested by the Office Action. 

Claims 8-11 and 45 were rejected for reciting "mammalian", "murine", "human", and
"synthetic" without distinguishing one from another. Claim 45 has been withdrawn from
consideration. Applicants traverse the rejection as applied to claims 8-11 on the grounds that
these are common terms known to those skilled in the chemical arts and are clearly given their 
ordinary meaning within the specification. 

Dictionary definitions, which also note the year each term began to be used in the English
language, are provided herewith for "mammal" (Exhibit 6) as relating to "mammalian", "murine"
(Exhibit 7), "human" (Exhibit 8), and "synthetic" (Exhibit 9). Applicants respectfully submit
that these terms, given their ordinary meaning, particularly point out and distinctly claim the
subject matter of the invention, as required by 35 U.S.C. §112. As a secondary matter, each of
claims 8-11 depends separately from claim 1. In contrast to what is implied in the Office Action
on page 8, last paragraph, these cited terms need not be mutually exclusive, so it is not necessary
to distinguish one from another; a peptide can indeed be both human and synthetic. A
meets the requirements of claim 1, an independent claim, is still encompassed within the 
invention without needing to fall within the narrower subsets of dependent claims. 

Claims 12 and 61 were rejected for reciting a peptide comprising amino acids (or amino 
acid residues), stating that the claims are confusing because it is not clear if the peptide has a 
specific amino acid sequence, or if the peptide just comprises the amino acid in the recited 
sequence in any given order. Claims 12 and 61 have been amended to more clearly distinguish 
the subject matter of the claims. This rejection is now moot as applied to the claims as amended. 

Claims 18, 31, 32 and 33 were rejected for reciting "any one of the peptides of claim 1" 
or similar language, where claim 1 is limited to a single peptide. Claims 18, 31, 32 and 33 have 
been amended to clarify the claim. This rejection is now moot as applied to the claims as 
amended. 

Claims 45, 48 and 56 are rejected for depending from a non-elected claim, where the 
claim was withdrawn for not reading on the elected species. Claims 45, 48 and 56 have been 
withdrawn, so the rejection is now moot. 

These 35 U.S.C. §112, second paragraph, rejections are believed moot or inappropriate as
applied to the claims as now pending. Applicants request that these rejections be withdrawn. 

Rejections under 35 U.S.C. §102(a) and 35 U.S.C. §102(b) over Grasso et al. 

Claims 1-4, 6-18, 39, 45 and 61 were rejected as anticipated by Grasso et al. (Endocrinol.
138: 1413-1418, 1997) ("Grasso"). Applicants traverse this rejection.

The leptin fragments described in Grasso are identified in the specification as SEQ ID
NOS: 11-16. The claims are drawn to polypeptides of SEQ ID NOS: 2-10 and 18, where SEQ ID
NO: 18 is the elected peptide currently under consideration. It is settled law that a reference that
does not describe each and every element of a claimed invention cannot be a bar to patentability
under 35 U.S.C. §102. See, Hybritech Inc. v. Monoclonal Antibodies, Inc., 802 F.2d 1367, 231
USPQ 81 (Fed. Cir. 1986). At the very least, the 15 amino acid peptides in Grasso are more than
twice the length of the 7 amino acid polypeptide of SEQ ID NO: 18, and Grasso does not provide
the leptin polypeptide of SEQ ID NO: 18. Therefore, Grasso is not prior art to the present 
invention. Applicants request that these rejections be withdrawn. 
Rejections under 35 U.S.C. §102(b) over Samson et al.

Claims 1-3, 9-14, 16, 18, 33 and 39 were rejected as anticipated by Samson et al.
(Endocrinol. 137(11): 5182-5185, 1996) ("Samson"). Applicants traverse this rejection.

The three leptin fragments described in Samson are from 35 amino acids to 52 amino 
acids in length, in comparison to the 7 amino acid polypeptide of SEQ ID NO: 18. Since Samson 
et al. does not describe each and every element of the claimed invention, Applicants request that
the rejection be withdrawn. 

Rejections under 35 U.S.C. §102(b) over Al-Barazanji et al.

Claims 1-4, 6-18, 39 and 61 were rejected as anticipated by Al-Barazanji et al. (PCT
Publication WO 97/46585, published 12/11/97) ("Al-Barazanji"). Applicants believe this
rejection is moot as applied to the claims as now pending. 

Al-Barazanji fails to describe any leptin fragments claimed in the invention, and
particularly does not disclose the polypeptide of SEQ ID NO:18. Since Al-Barazanji does not
describe each and every element of the claimed invention, Applicants request that the rejection 
be withdrawn. 
Conclusion

On the basis of the foregoing amendments, Applicants respectfully submit that the 
pending claims are in condition for allowance. If there are any questions regarding these 
amendments and remarks, the Examiner is encouraged to contact either of the undersigned at the 
telephone number provided below. The Commissioner is hereby authorized to charge any 
underpayments, or credit any overpayment of same, to Deposit Account No. 50-0311 (Reference
No. 19705-001). 

Respectfully submitted,




Ivor R. Elrifi, Reg. No. 3^29 
Dated: February 19, 2002 Kristin E. Konzak, Reg. No. 44,848 

Attorneys/Agents for Applicants
MINTZ, LEVIN, COHN, FERRIS, 
GLOVSKY and POPEO, P.C. 
One Financial Center 
Boston, Massachusetts 02111
Tel: (617)542-6000 
Fax: (617) 542-2241 
Appendix A: marked up version of the specification showing the changes made

In the Specification: 

Replace the paragraphs on page 11, lines 21-26, with the following paragraphs:

[FIG. 2 is a graphic representation] FIGS. 2A-2B are graphic representations of the 
effects of synthetic leptin peptide OB-3 on body weight gain and food intake in genetically obese 
female C57BL/6J ob/ob mice. 

[FIG. 3 is a graphic representation] FIGS. 3A-3B are graphic representations of the 
effects of synthetic OB-3 on body weight gain and food intake in genetically obese female 
C57BLKS/J-m db/db mice. 

[FIG. 4 is a graphic representation] FIGS. 4A - 4N are graphic representations of the 
effects of 7 daily injections of various synthetic leptin peptides [(Panels A - N)] on body weight 
gain in female C57BL/6J ob/ob mice. 

Amend the paragraph on page 12, lines 3-4, as follows: 

[FIG. 6 is a graphic representation] FIGS. 6A - 6D are graphic representations of the
effects of 12 daily injections of various synthetic leptin peptides [(Panels A - D)] on body weight 
gain in female C57BL/6J ob/ob mice. 

Amend the paragraphs on page 12, lines 12-16, as follows: 

[FIG. 10 is a graphic representation] FIGS. 10A - 10B are graphic representations of the
effects of 12 daily injections of LEP(116-130) synthetic peptide on body weight gain and food
consumption in female db/db mice. 

[FIG. 11 is a graphic representation] FIGS. 11A - 11B are graphic representations of the
effects of 7 daily injections of various synthetic leptin peptide on body weight gain [(Panel A)] 
and food consumption [(Panel B)] in genetically obese female C57BLKS/J-m db/db mice. 
Amend the paragraph on page 12, lines 19-21, as follows:

[FIG. 13 is a graphic representation] FIGS. 13A - 13B are graphic representations of the 
effects of LEP(116-130) peptide on thermogenesis in genetically obese female C57BLKS/J-m
db/db mice after 4 days [(Panel A)] and 7 days [(Panel B)] of peptide treatment.

On page 48, line 7, immediately before the word "was" please insert --(SEQ ID NO:2)--.
On line 12, delete "waster" and insert --water--. 



In the Drawings: 

Delete previously filed FIGS 1-16 and replace them with the enclosed formal drawings 
FIGS 1-16. 
In the Claims:

Please cancel claims 45, 48 and 56 without prejudice or disclaimer. 
Please amend claims 4, 6, 12, 18, 31-33 and 61 as follows: 



--4. (Twice Amended) A leptin peptide having the amino acid sequence Xaan-Ser-Cys-Xaa1-Leu-Pro-Xaa2-Xaa3-Xaan, wherein:

(a) either Xaan [may be] is zero or a contiguous stretch of at most seven peptide
residues derived from SEQ ID NOS: 1 or 17; and

(b) Xaa1, Xaa2 and Xaa3 [may be] is any amino acid substitution. --



--6. (Twice Amended) The leptin peptide of claim 4, wherein:

(a) Xaa1 [may be] is selected from the group consisting of His or Ser;

(b) Xaa2 [may be] is selected from the group consisting of Trp or Gln;

(c) Xaa3 [may be] is selected from the group consisting of Ala or Thr; or

(d) the leptin peptide contains any combination of (a) or (b) or (c).



--12. (Twice Amended) A peptide comprising an amino acid [residues] sequence of the leptin
protein of any one of SEQ ID NOS:1 and 17, selected from the group consisting of:

(i) a sequence comprising amino [acid residues] acids 21-35 (SEQ ID NO:3);

(ii) a sequence comprising amino [acid residues] acids 31-45 (SEQ ID NO:4);

(iii) a sequence comprising amino [acid residues] acids 41-55 (SEQ ID NO:5);

(iv) a sequence comprising amino [acid residues] acids 51-65 (SEQ ID NO:6);

(v) a sequence comprising amino [acid residues] acids 61-75 (SEQ ID NO:7);

(vi) a sequence comprising amino [acid residues] acids 71-85 (SEQ ID NO:8);

(vii) a sequence comprising amino [acid residues] acids 81-95 (SEQ ID NO:9);

(viii) a sequence comprising amino [acid residues] acids 91-105 (SEQ ID NO:10);

(ix) a sequence comprising mouse amino [acid residues] acids 116-122 (SEQ ID
NO:2);

(x) a sequence comprising human amino [acid residues] acids 116-122 (SEQ ID
NO:18);

and fragments, derivatives, homologs and analogs thereof.--

--18. (Twice Amended) A [pharmaceutical] composition comprising [any one of the leptin
peptides] the leptin peptide of claim 1, and a pharmaceutically acceptable carrier.--

--31. (Twice Amended) [Any one of the peptides] The peptide of claim 1, wherein the peptide
contains at least one D-amino acid substitution.--

--32. (Twice Amended) [Any one of the peptides] The peptide of claim 1, wherein the peptide
contains at least one variant amino acid substitution selected from the group comprising
Cα-methylamino acids, Nα-methylamino acids and α,β-unsaturated amino acids.--

--33. (Twice Amended) [Any one of the peptides] The peptide of claim 1, wherein the peptide
is cyclized.--

--61. (Amended) A purified peptide comprising the amino acid [residues] sequence of SEQ ID
NO:18.--






Applicants: Grasso et al. 
USSN: 09/6377,081 



Appendix C: ClustalW Alignment of Leptin Peptides Disclosed Before August 1999 



[Residue-level alignment not legible in the scanned copy. The aligned sequences are listed below, with the total residue count shown at the end of each row where legible. The alignment also included a consensus line and, beneath it, SEQ ID NO:18.]

Identifier     Species              Residues
SEQ ID NO:1                         166
SEQ ID NO:17                        167
AAC50730       Macaca mulatta       167
AAB51033       Ovis aries           117
AAB05923       Sus scrofa           167
AAB06579       Bos taurus           146
AAC48641       Sus scrofa            96
AAB17091       Gorilla gorilla      146
AAB17092       Pongo pygmaeus
AAB41786       Ovis aries
CAA72197       Bos taurus           100
BAA19750       Bos taurus           146
AAB53654       Canis familiaris     146
AAB54023       Pan troglodytes      146
153166         Homo sapiens         166
LTHU           Homo sapiens         167
LTMS           Mus musculus         167
AAB61244       Bos taurus           167
AAC60368       Gallus gallus        163
Q95189         Gorilla gorilla      146
Q28504         Macaca mulatta
Q95234         Pongo pygmaeus       146
AAB97308       Sus scrofa           167
AAC06303       Sus scrofa           167
O02750         Pan troglodytes      146
Q28603         Ovis aries           146
AAC32380       Gallus gallus        163
AAC32381       Meleagris gallop     145
1AX8           Homo sapiens         146



Main Entry: ho.mo.logue
Function: noun
Date: 1848

Variant(s): or ho.mo.log /'hO-m&-"log, 'h@-, -"l@g/

: something (as a chemical compound or a chromosome) homologous 
Main Entry: ho.mol.o.gous
Pronunciation: hO-'ma-l& g&s, h&-
Function: adjective

Etymology: Greek homologos agreeing, from hom- + legein to say --
more at LEGEND
Date: 1660

1 a : having the same relative position, value, or structure: as (1) :
exhibiting biological homology (2) : having the same or allelic genes with
genetic loci usually arranged in the same order b : belonging to or
consisting of a chemical series whose successive members have a 
regular difference in composition especially of one methylene group 

2 : derived from or developed in response to organisms of the same 
species 



Main Entry: an.a.log 
Pronunciation: 'a-n['^&]l-"og, -"ag 
Function: adjective 
Date: 1948 

1 : of, relating to, or being an analogue 

2 a : of, relating to, or being a mechanism in which data is represented
by continuously variable physical quantities b : of or relating to an analog 
computer c : being a timepiece having hour and minute hands 

Main Entry: ''an.a.logue 
Function: noun 

Etymology: French analogue, from analogue analogous, from Greek analogos

Date: 1826 

Variant(s): or an.a.log /'a-nf'^&JI-'og, -"@g/ 

1 : something that is analogous or similar to something else 

2 : an organ similar in function to an organ of another animal or plant but 
different in structure and origin 

3 usually analog : a chemical compound that is structurally similar to 
another but differs slightly in composition (as in the replacement of one 
atom by an atom of a different element or in the presence of a particular 
functional group) 

4 : a food product made by combining a less expensive food (as 
soybeans or whitefish) with additives to give the appearance and taste of 
a more expensive food (as beef or crab) 

Main Entry: analog computer
Function: noun 
Date: 1948 

: a computer that operates with numbers represented by directly 
measurable quantities (as voltages or rotations) -- compare DIGITAL
COMPUTER, HYBRID COMPUTER



Main Entry: de.riv.a.tive
Pronunciation: di-'ri-v&-tiv 
Function: noun 
Date: 15th century 

1 : a word formed by derivation 

2 : something derived 

3 : the limit of the ratio of the change in a function to the corresponding 
change in its independent variable as the latter change approaches zero 

4 a : a chemical substance related structurally to another substance and 
theoretically derivable from it b : a substance that can be made from 
another substance 

Main Entry: derivative
Function: adjective 
Date: circa 1530 

1 : formed by derivation 

2 : made up of or marked by derived elements 

3 : lacking originality : BANAL

- de.riv.a.tive.ly adverb 

- de.riv.a.tive.ness noun 

Main Entry: partial derivative 
Function: noun 
Date: 1889 

: the derivative of a function of several variables with respect to one of 
them and with the remaining variables treated as constants 
A General Method Applicable to the Search for Similarities
in the Amino Acid Sequence of Two Proteins

Saul B. Needleman and Christian D. Wunsch

Department of Biochemistry, Northwestern University, and 
Nuclear Medicine Service, V. A. Research Hospital 
Chicago, Ill. 60611, U.S.A.

(Received 21 July 1969)

A computer adaptable method for finding similarities in the amino acid sequences 
of two proteins has been developed. From these findings it is possible to determine 
whether significant homology exists between the proteins. This information is 
used to trace their possible evolutionary development. 

The maximum match is a number dependent upon the similarity of the
sequences. One of its definitions is the largest number of amino acids of one protein
that can be matched with those of a second protein allowing for all possible
interruptions in either of the sequences. While the interruptions give rise to a
very large number of comparisons, the method efficiently excludes from
consideration those comparisons that cannot contribute to the maximum match.

Comparisons are made from the smallest unit of significance, a pair of amino
acids, one from each protein. All possible pairs are represented by a
two-dimensional array, and all possible comparisons are represented by pathways through
the array. For this maximum match only certain of the possible pathways must be
evaluated. A numerical value, one in this case, is assigned to every cell in the
array representing like amino acids. The maximum match is the largest number
that would result from summing the cell values of every pathway.
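The maximum match described above can be sketched in a few lines. This is a minimal Python illustration under the simplest scoring the abstract names: a cell value of one for identical amino acids, zero otherwise, and no penalty for interruptions. It uses the now-standard forward dynamic-programming recurrence rather than reproducing the paper's own summation procedure over the array. With these scores both yield the same maximum match. The function name `maximum_match` and the example sequences are hypothetical.

```python
def maximum_match(a: str, b: str) -> int:
    """Maximum match of two sequences: the most identical residue pairs that can
    be formed while keeping both sequences in order, with interruptions (gaps)
    free. Cell value is 1 for identical residues and 0 otherwise."""
    # score[i][j] holds the maximum match between b[:i] and a[:j];
    # a indexes the columns (Aj) and b the rows (Bi), as in the paper's MAT array.
    score = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]
    for i in range(1, len(b) + 1):
        for j in range(1, len(a) + 1):
            cell = 1 if b[i - 1] == a[j - 1] else 0
            score[i][j] = max(
                score[i - 1][j - 1] + cell,  # pair Bi with Aj
                score[i - 1][j],             # leave Bi unpaired (interruption in A)
                score[i][j - 1],             # leave Aj unpaired (interruption in B)
            )
    return score[len(b)][len(a)]


if __name__ == "__main__":
    print(maximum_match("SCHLPWA", "SCSLPQT"))
```

Because interruptions cost nothing here, the result equals the length of a longest common subsequence. Later formulations of the method add gap penalties and substitution scores.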

1. Introduction 

The amino acid sequences of a number of proteins have been compared to determine
whether the relationships existing between them could have occurred by chance.
Generally, these sequences are from proteins having closely related functions and are
so similar that simple visual comparisons can reveal sequence coincidence. Because
the method of casual comparison is tedious and because the determination of the
significance of a given result usually is left to intuitive rationalization,
computer-based statistical approaches have been proposed (Fitch, 1966; Needleman & Blair,
1969).

Direct comparison of two sequences, based on the presence in both of corresponding
amino acids in an identical array, is insufficient to establish the full genetic
relationships between the two proteins. Allowance for gaps (Braunitzer, 1965) greatly
multiplies the number of comparisons that can be made but introduces unnecessary
and partial comparisons.

2. A General Method for Sequence Comparison

The smallest unit of comparison is a pair of amino acids, one from each protein. The
maximum match can be defined as the largest number of amino acids of one protein that
can be matched with those of another protein while allowing for all possible deletions.
The maximum match can be determined by representing in a two-dimensional array
all possible pair combinations that can be constructed from the amino acid sequences
of the proteins, A and B, being compared. If the amino acids are numbered from the
N-terminal end, Aj is the jth amino acid of protein A and Bi is the ith amino acid of
protein B. The Aj represent the columns and the Bi the rows of the two-dimensional
array, MAT. Then the cell, MATij, represents a pair combination that contains Aj and
Bi.

Every possible comparison can now be represented by pathways through the array.
An i or j can occur only once in a pathway because a particular amino acid cannot
occupy more than one position at one time. Furthermore, if MATmn is part of a pathway
including MATij, the only permissible relationships of their indices are m > i, n > j
or m < i, n < j. Any other relationships represent permutations of one or both amino
acid sequences which cannot be allowed since this destroys the significance of a
sequence. Then any pathway can be represented by MATab ... MATyz, where a ≥ 1,
b ≥ 1, the i and j of all subsequent cells of MAT are larger than the running indices
of the previous cell, and y ≤ K, z ≤ M, where K and M are the total numbers of amino
acids comprising the sequences of proteins B and A, respectively. A pathway is signified
by a line connecting cells of the array. Complete diagonals of the array contain no gaps.
When MATij and MATmn are part of a pathway, i − m ≠ j − n is a sufficient, but not a
necessary, condition for a gap to occur. A necessary pathway through MAT is defined
as one which begins at a cell in the first column or the first row. Both i and j must
increase in value; either i or j must increase by only one but the other index may
increase by one or more. This leads to the next cell in a MAT pathway. This procedure
is repeated until either i or j, or both, equal their limiting values, K and M,
respectively. Every partial or unnecessary pathway will be contained in at least one
necessary pathway.

In the simplest method, MATij is assigned the value, one, if Aj is the same kind
of amino acid as Bi; if they are different amino acids, MATij is assigned the value,
zero. The sophistication of the comparison is increased if, instead of zero or one, each
cell value is made a function of the composition of the proteins, the genetic code
triplets representing the amino acids, the neighboring cells in the array, or any theory
concerned with the significance of a pair of amino acids. A penalty factor, a number
subtracted for every gap made, may be assessed as a barrier to allowing the gap. The
penalty factor could be a function of the size and/or direction of the gap. No gap
would be allowed in the operation unless the benefit from allowing that gap would
exceed the barrier. The maximum-match pathway, then, is that pathway for which
the sum of the assigned cell values (less any penalty factors) is largest. MAT can be
broken up into subsections operated upon independently. The method also can be
expanded to allow simultaneous comparison of several proteins using the amino acid
sequences of n proteins to generate an n-dimensional array whose cells represent all
possible combinations of n amino acids, one from each protein.

SIMILARITIES IN AMINO ACID SEQUENCE 445

The maximum-match pathway can be obtained by beginning at the terminals of
the sequences (i = y, j = z) and proceeding toward the origins, first by adding to the
value of each cell possessing indices i = y − 1 and/or j = z − 1, the maximum
value from among all the cells which lie on a pathway to it. The process is repeated for
indices i = y − 2 and/or j = z − 2. This increment in the indices is continued until
all cells in the matrix have been operated upon. Each cell in the outermost (first) row
or column will then contain the maximum number of matches that can be obtained by
originating any pathway at that cell, and the largest number in that row or column is
equal to the maximum match; the maximum-match pathway in any row or column must
begin at this number. The operation of successive summations of cell values is
illustrated in Figures 1 and 2.
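The successive-summation procedure can be sketched in Python. This is a minimal sketch, assuming identity scoring (one for identical residues, zero otherwise) and no penalty factor; the function name `maximum_match` and the use of one-letter strings are illustrative choices, not part of the original program. Rows are indexed by i over protein B and columns by j over protein A, as in the text, and the example pair is the one used in Figs. 1 and 2.

```python
def maximum_match(a: str, b: str) -> int:
    """Maximum match by successive summation, identity scoring, no penalty.

    Columns j run over sequence a (the Aj); rows i run over sequence b (the Bi).
    """
    rows, cols = len(b), len(a)
    # Cell value: 1 for a pair of identical residues, 0 otherwise.
    mat = [[1 if b[i] == a[j] else 0 for j in range(cols)] for i in range(rows)]
    # The last row and last column have no successors and keep their values.
    # Each remaining row is processed only after every row below it.
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            # Cells that may follow (i, j) on a necessary pathway:
            # subrow i+1 from column j+1 on, and subcolumn j+1 from row i+1 on.
            best_in_subrow = max(mat[i + 1][j + 1:])
            best_in_subcol = max(mat[m][j + 1] for m in range(i + 1, rows))
            mat[i][j] += max(best_in_subrow, best_in_subcol)
    # The maximum match is the largest number in the first row or first column.
    return max(max(mat[0]), max(row[0] for row in mat))


print(maximum_match("ABCNJRQCLCRPM", "AJCJNRCKCRBP"))  # 8
```

The printed value agrees with the maximum match of 8 reported for the completed array.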

[Fig. 1 array: columns ABCNJRQCLCRPM (protein A), rows AJCJNRCKCRBP (protein B).]

Fig. 1. The maximum-match operation for necessary pathways. 
The number contained in each cell of the array is the largest number of identical pairs that can
be found if that cell is the origin for a pathway which proceeds with increases in running indices.
Identical pairs of amino acids were given the value of one. Blank cells which represent non-identical
pairs have the value, zero. The operation of successive summations was begun at the last row of the
array and proceeded row-by-row towards the first row. The operation has been partially completed 
in the R row. The enclosed cell in this row is the site of the cell operation which consists of a search 
along the subrow and subcolumn indicated by borders for the largest value, 4 in subrow C. This 
value is added to the cell from which the search began. 





Fig. 2. Contributors to the maximum match in the completed array.
The alternative pathways that could form the maximum match are illustrated. The maximum
match terminates at the largest number in the first row or first column, 8 in this case.

It is apparent that the above array operation can begin at any of a number of
points along the borders of the array, which is equivalent to a comparison of N-terminal
residues or C-terminal residues only. As long as the appropriate rules for pathways
are followed, the maximum match will be the same. The cells of the array which
contributed to the maximum match may be determined by recording the origin of
the number that was added to each cell when the array was operated upon.
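Recording the origin of each added number makes the matched pairs recoverable. The sketch below is a hypothetical extension of the identity-scoring case: for every cell it stores which successor cell supplied the value that was added, then follows those links from the largest number in the first row or column. Where several successors tie, Python's `max` keeps the first one found, so the function returns one of possibly several alternative maximum-match pathways (compare Fig. 2). The function name and the (j, i) output format are assumptions.

```python
def maximum_match_pathway(a: str, b: str):
    """Return (j, i) index pairs of identical residues on one maximum-match pathway.

    j indexes sequence a (columns, the Aj); i indexes sequence b (rows, the Bi).
    """
    rows, cols = len(b), len(a)
    mat = [[1 if b[i] == a[j] else 0 for j in range(cols)] for i in range(rows)]
    origin = [[None] * cols for _ in range(rows)]  # successor whose value was added
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            # Subrow i+1 (columns j+1 on) plus subcolumn j+1 (rows i+2 on).
            successors = [(i + 1, n) for n in range(j + 1, cols)]
            successors += [(m, j + 1) for m in range(i + 2, rows)]
            best = max(successors, key=lambda cell: mat[cell[0]][cell[1]])
            mat[i][j] += mat[best[0]][best[1]]
            origin[i][j] = best
    # A necessary pathway starts in the first row or the first column.
    starts = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    cell = max(starts, key=lambda c: mat[c[0]][c[1]])
    pairs = []
    while cell is not None:  # follow the recorded origins to the end of the path
        i, j = cell
        if b[i] == a[j]:
            pairs.append((j, i))
        cell = origin[i][j]
    return pairs


pairs = maximum_match_pathway("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(len(pairs))  # 8
```

Because each stored origin is the successor whose value was added, the identical pairs collected along the path always number exactly the maximum match.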

3. Evaluating the Significance of the Maximum Match 

A given maximum match may represent the maximum number of amino acids
matched, or it may just be a number that is a complex function of the relationship
between sequences. It will, however, always be a function of both the amino acid
compositions of the proteins and the relationship between their sequences. One may
ask whether a particular result found differs significantly from a fortuitous match
between two random sequences. Ideally, one would prefer to know the exact probability
of obtaining the result found from a pair of random sequences and what fraction
of the total possibilities are less probable, but that is prohibitively difficult, especially
if a complex function were used for assigning a value to the cells.

As an alternative to determining the exact probabilities, it is possible to estimate
the probabilities experimentally. To accomplish the estimate one can construct two
sets of random sequences, a set from the amino acid composition of each of the
proteins compared. Pairs of random sequences can then be formed by randomly drawing
one member from each set. Determining the maximum match for each pair selected
will yield a set of random values. If the value found for the real proteins is significantly
different from the values found for the random sequences, the difference is a function
of the sequences alone and not of the compositions. Alternatively, one can construct
random sequences from only one of the proteins and compare them with the other to
determine a set of random values. The two procedures measure different probabilities.
The first procedure determines whether a significant relationship exists between the
real sequences. The second procedure determines whether the relationship of the
protein used to form the random sequences to the other protein is significant. It bears
reiterating that the integral amino acid composition of each random sequence must
be equal to that of the protein it represents.

The amino acid sequence of each protein compared belongs to a set of sequences
which are permutations. Sequences drawn randomly from one or both of these sets
are used to establish a distribution of random maximum-match values which would
include all possible values if enough comparisons were made. The null hypothesis, that
any sequence relationship manifested by the two proteins is a random one, is tested.
If the distribution of random values indicates a small probability that a maximum
match equal to, or greater than, that found for the two proteins could be drawn from
the random set, the hypothesis is rejected.

4. Cell Values and Weighting Factors

To provide a theoretical framework for experiments, amino acid pairs may be
classified into two broad types, identical and non-identical pairs. From 20
amino acids one can construct 190 possible non-identical pairs. Of these, 76 pairs of
amino acids have codons (Marshall, Caskey & Nirenberg, 1967) whose bases differ in
only one position (Eck & Dayhoff, 1966). Each change is presumably the result of a
single-point mutation. The majority of non-identical pairs have a maximum of only
one or zero corresponding bases. Due to the degeneracy of the genetic code, pair
differences representing amino acids with no possible corresponding bases are
uncommon even in randomly selected pairs. If cells are weighted in accordance with the
maximum number of corresponding bases in codons of the represented amino acids,
the maximum match will be a function of identical and non-identical pairs. For
comparisons in general, the cell weights can be chosen on any basis.

If every possible sequence gap is allowed in forming the maximum match, the
significance of the maximum match is enhanced by decreasing the weight of those
pathways containing a large number of gaps. A simple way to accomplish this is to
assign a penalty factor, a number which is subtracted from the maximum match for
each gap used to form it. The penalty is assigned before the maximum match is formed.
Thus the pathways will be weighted according to the number of gaps they contain,
but the nature of the contributors to the maximum match will be affected as well. In
proceeding from one cell to the next in a maximum-match pathway, it is necessary
that the difference between each cell value and the penalty be greater than the value
for a cell in a pathway that contains no gap. If the value of the penalty were zero, all
possible gaps could be allowed. If the value were equal to the theoretical value for
the maximum match between two proteins, it would be impossible to allow a gap and
the maximum match would be the largest of the values found by simply summing
along the diagonals of the array; this is the simple frame-shift method.
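A sketch of the penalty rule and its frame-shift limit follows. It assumes non-negative cell values (the special case the authors programmed), a score function supplied by the caller, and a penalty that is the same for every gap regardless of its size or direction; the text allows the penalty to depend on both, so this is a simplification. A step to the next cell on the same diagonal costs nothing, and any other step is one gap with the penalty subtracted. `frame_shift` is a separate, hypothetical helper that simply sums each complete diagonal.

```python
from typing import Callable

Score = Callable[[str, str], float]


def maximum_match(a: str, b: str, score: Score, penalty: float = 0.0) -> float:
    """Successive summation with a fixed penalty per gap (non-negative scores)."""
    rows, cols = len(b), len(a)
    mat = [[score(b[i], a[j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows - 2, -1, -1):
        for j in range(cols - 2, -1, -1):
            best = mat[i + 1][j + 1]  # next cell on the same diagonal: no gap
            # Any other successor in subrow i+1 or subcolumn j+1 opens a gap.
            gapped = [mat[i + 1][n] for n in range(j + 2, cols)]
            gapped += [mat[m][j + 1] for m in range(i + 2, rows)]
            if gapped:
                best = max(best, max(gapped) - penalty)
            mat[i][j] += best
    return max(max(mat[0]), max(row[0] for row in mat))


def frame_shift(a: str, b: str, score: Score) -> float:
    """Largest sum along any complete diagonal of the array (no gaps at all)."""
    rows, cols = len(b), len(a)
    best = 0.0
    for offset in range(-(rows - 1), cols):  # offset = j - i
        total = sum(score(b[i], a[i + offset])
                    for i in range(rows) if 0 <= i + offset < cols)
        best = max(best, total)
    return best


def identity(x: str, y: str) -> float:
    return 1.0 if x == y else 0.0
```

With `penalty=0.0` the recurrence reduces to the plain subrow and subcolumn search; with a penalty larger than any attainable match, every gapped step loses to the diagonal step and the result equals `frame_shift`.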

5. Application of the Method 

To illustrate the role of weighting factors in evaluating a maximum match, two
proteins expected to show homology, whale myoglobin (Edmundson, 1965) and human
β-hemoglobin (Konigsberg, Goldstein & Hill, 1963), and two proteins not expected to
exhibit homology, bovine pancreatic ribonuclease (Smyth, Stein & Moore, 1963) and
hen's egg lysozyme (Canfield, 1963), were chosen for comparisons.

The Fortran programs used in this study were written for the CDC3400 computer. 
The operations employed in forming the maximum match are those for the special 
case when none of the cells of the array have a value less than zero. Four types of 
amino acid pairs were distinguished and variable sets consisting of values to be 
assigned to each type of pair and a value for the penalty were established. The pair
types are as follows: 

Type 3. Identical pairs: those having a maximum of three corresponding bases in
their codons.
Type 2. Pairs having a maximum of two corresponding bases in their codons.
Type 1. Pairs having a maximum of one corresponding base in their codons.
Type 0. Pairs having no possible corresponding base in their codons.

The value for type 3 pairs was 1.0 and the value for type 0 pairs was zero for all
variable sets. 

At program execution time, the amino acids (coded by two-digit numbers) of the
sequences to be compared were read into the computer, and were followed by a
twenty-by-twenty symmetrical array, the maximum correspondence array, analogous
to one used by Fitch (1966), that contained all possible pairs of amino acids and
identified each pair as to type. The RNA codons for amino acids used to construct
the maximum-correspondence array were taken from a single Table (Marshall et al.,
1967). The UGA, UAA and UAG codons were not used, but UUG was used as a
codon for leucine. The subsequent data cards indicated the numerical values for a
variable set.

The two-dimensional comparison array was generated row-by-row. The amino acid
code numbers for Aj and Bi referenced the correspondence array to determine the
type of amino acid pair constituted by Aj and Bi. The type number referenced a short
array, the variable set, containing the type values, and the appropriate value from
that set was assigned to the appropriate cell of the comparison array. The maximum
match was then determined by the procedure of successive summations.
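The table-lookup scheme just described can be sketched as follows. This is an illustrative reconstruction, not the original program: it uses one-letter amino acid codes instead of two-digit numbers, writes the standard genetic code in DNA letters (T standing for U), and drops the three stop codons as the text does. The pair type is the maximum number of corresponding bases over all codon pairs, and the variable-set values used here (1.0, 0.67, 0.33 and 0 for types 3, 2, 1 and 0) are those of variable sets 3 and 4. The names `CORRESPONDENCE`, `VARIABLE_SET` and `comparison_array` are hypothetical.

```python
from itertools import product

BASES = "TCAG"  # DNA letters; T stands in for the RNA base U
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for bases, aa in zip(product(BASES, repeat=3), TABLE):
    if aa != "*":  # stop codons (UAA, UAG, UGA) are left out, as in the text
        CODONS.setdefault(aa, []).append("".join(bases))

AMINO_ACIDS = sorted(CODONS)


def pair_type(x: str, y: str) -> int:
    """Maximum number of corresponding bases over all codons of x and y."""
    return max(sum(p == q for p, q in zip(c1, c2))
               for c1 in CODONS[x] for c2 in CODONS[y])


# Symmetrical 20 x 20 "maximum correspondence array" of pair types.
CORRESPONDENCE = {(x, y): pair_type(x, y) for x in AMINO_ACIDS for y in AMINO_ACIDS}

# Hypothetical variable set: value assigned to each pair type.
VARIABLE_SET = {3: 1.0, 2: 0.67, 1: 0.33, 0: 0.0}


def comparison_array(a: str, b: str, variable_set=VARIABLE_SET):
    """Row i, column j holds the variable-set value for the type of pair (Bi, Aj)."""
    return [[variable_set[CORRESPONDENCE[(bi, aj)]] for aj in a] for bi in b]
```

Changing the variable set only means swapping the small `VARIABLE_SET` mapping; the correspondence array is built once.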

Following the determination of the maximum match for the real proteins, the amino
acid sequence of only one member of the protein pair was randomized and the match
was repeated. The sequences of β-hemoglobin and ribonuclease were the ones randomized.
The randomization procedure was a sequence shuffling routine based on
computer-generated random numbers. A cycle of sequence randomization and
maximum-match determination was repeated ten times in all of the experiments in
this report, giving the random values used for comparison with the real maximum
match. The average and standard deviation for the random values of each variable set
were estimated.
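The randomization experiment can be sketched as below. For identity scoring with no penalty, the maximum match equals the length of the longest common subsequence, so this sketch swaps in the standard longest-common-subsequence recurrence as a compact stand-in for successive summation; with weighted cells or a penalty the full procedure would be needed. Only the second sequence is shuffled, which preserves its amino acid composition. The seed, the number of trials and the function names are assumptions. The sketch also reports the fraction of random values at least as large as the real one, an empirical counterpart of the probability discussed in Section 3.

```python
import random
import statistics


def maximum_match(a: str, b: str) -> int:
    """Identity-scored maximum match, computed as a longest common subsequence."""
    prev = [0] * (len(a) + 1)
    for y in b:
        cur = [0]
        for j, x in enumerate(a):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def randomization_test(a: str, b: str, trials: int = 10, seed: int = 0):
    """Compare the real maximum match with matches against shuffled copies of b."""
    rng = random.Random(seed)
    real = maximum_match(a, b)
    randoms = []
    for _ in range(trials):
        shuffled = list(b)
        rng.shuffle(shuffled)  # same composition, random order
        randoms.append(maximum_match(a, "".join(shuffled)))
    mean = statistics.mean(randoms)
    s = statistics.stdev(randoms)
    x = (real - mean) / s if s > 0 else float("nan")  # standardized value
    tail = sum(r >= real for r in randoms) / trials    # empirical tail fraction
    return real, mean, s, x, tail


real, mean, s, x, tail = randomization_test("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
```

With only ten shuffles the estimate of s is rough, which is the same caution the authors raise about their sample size.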

6. Results and Discussion 

The use of a small random sample size (ten) was necessary to hold the computer
time to a reasonable level. The maximum probable error in a standard deviation
estimate for a sample this small is quite large and the results should be judged with
this fact in mind. For each set of variables, it was assumed that the random values
would be distributed in the fashion of the normal-error curve; therefore, the values of
the first six random sets in the β-hemoglobin-myoglobin comparison were converted
to standard measure, five was added to the result, and these values were plotted as
one group against their calculated probit. The results of the plot are shown in Figure 3.
The fit is good, indicating the probable adequacy of the measured standard deviations
for these variable sets in estimating distribution functions for random values through
two standard deviations. The above fit indicates no bias in the randomization
procedure. In other words, randomization of the sequence was complete before the
maximum match was determined for any sequence in a random set.
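The normality check can be sketched in the same spirit. The sketch converts a group of random values to standard measure, adds five as described, and pairs each with a probit, that is, five plus the standard normal quantile of its cumulative proportion. The plotting position (rank − 0.5)/n is an assumption, since the paper does not say how cumulative proportions were assigned; points lying close to a line of unit slope through (5, 5) indicate an approximately normal distribution. `probit_points` is a hypothetical name.

```python
from statistics import NormalDist, mean, stdev


def probit_points(values):
    """Return (standard measure + 5, probit) pairs for a sample, sorted by value."""
    m, s = mean(values), stdev(values)
    n = len(values)
    normal = NormalDist()  # standard normal distribution
    points = []
    for rank, v in enumerate(sorted(values), start=1):
        p = (rank - 0.5) / n  # assumed plotting position
        points.append(((v - m) / s + 5.0, normal.inv_cdf(p) + 5.0))
    return points


pts = probit_points([1, 2, 3, 4, 5])
```

For a symmetric sample like this one, the middle point falls exactly at (5, 5) and the probits of the outer points are symmetric about 5.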

The results obtained in the comparison of β-hemoglobin with myoglobin are
summarized in Table 1 and the results for the ribonuclease-lysozyme comparison are
in Table 2. These Tables indicate the values assigned to the pair types, the penalty
factor used in forming each of the maximum matches, and the statistical results
obtained. The number of gaps roughly characterizes the nature of the pathway that
formed the maximum match. A large number is indicative of a devious pathway
through the array. One gap means that all of the pathway may be found on only two
partial diagonals of the array.

The most important information is obtained from the standardized value of the
maximum match for the real proteins, the difference from the mean in standard
deviation units. For this sample size all deviations greater than 3.0 were assumed to
include less than 1% of the true random population and to indicate a significant
difference. As might be expected, all matches of myoglobin and β-hemoglobin show a
significant deviation. Among the sets of variables, set 1, which results in a search for
identical amino acid pairs while allowing for all deletions, indicates that 63 amino
acids in β-hemoglobin and myoglobin can be matched. To attain this match, however,
it is necessary to permit at least 35 gaps. In contrast, when two gaps are allowed
according to Braunitzer (1965), it is possible to match only 37 of the amino acids.
Curiously, when this variable set was used for comparing human myoglobin (Hill,
personal communication) with human β-hemoglobin, the maximum match obtained
was not significant. Differences between real and random values were highly
significant, however, when other variable sets were used.

[Fig. 3. Plot of grouped random samples; axis: variable in standardized measure.]

Table 1

β-Hemoglobin-myoglobin maximum matches

Variable  Match values      Penalty  Maximum-match value sum  s      X      Minimum deletions
set       for pair types
          2        1                 Real      Random†                      Real    Random†
1         0        0        0        63.00     55.60          1.80   4.11   35      36.2
2         0        0        1.00     38.00     27.80          2.09   4.88   4       5.5
3         0.67     0.33     0        97.00     91.47          1.55   3.57   18      24.3
4         0.67     0.33     1.03     89.63     80.25          1.11   8.46   1       3.6
5         0.25     0.05     0        71.55     64.78          1.59   4.27   46      45.0
6         0.25     0.05     1.05     51.95     40.54          1.46   7.80   3       7.5
7         0.25     0.05     25       47.30     33.80          1.52   8.87   0       0

s is the estimated standard deviation; X, the standardized value, (real − random)/s, of the
maximum match of the real proteins. The values for type 3 and type 0 pairs were 1.0 and 0,
respectively, in each variable set.
† An average value from 10 samples.

Table 2

Ribonuclease-lysozyme maximum matches

Variable  Match values      Penalty  Maximum-match value sum  s      X      Minimum deletions
set       for pair types
          2        1                 Real      Random†                      Real
1         0        0        0        48.00     44.20          2.56   1.48   34
2         0        0        1.00     23.00     22.00          1.73   0.58   5
3         0.67     0.33     0        78.33     76.17          0.82   2.64   21
4         0.67     0.33     1.03     67.03     67.37          1.27   0.43   2
5         0.25     0.05     0        56.00     52.26          2.12   1.77   35
6         0.25     0.05     1.05     33.70     33.02          1.66   0.41   8
7         0.25     0.05     25       28.15     27.67          1.75   0.22   0

s is the estimated standard deviation; X, the standardized value, (real − random)/s, of the
maximum match of the real proteins. The values for type 3 and type 0 pairs were 1.0 and 0,
respectively, in each variable set.
† An average value from 10 samples.
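As a worked check of the standardized value defined in the table footnote, using the set 1 entries of Table 1 (real maximum match, mean of the random values, and estimated standard deviation s):

```latex
X = \frac{\text{real} - \overline{\text{random}}}{s}
  = \frac{63.00 - 55.60}{1.80}
  = \frac{7.40}{1.80} \approx 4.11
```

This reproduces the tabulated 4.11, which lies above the 3.0 threshold taken to indicate a significant difference.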

Variable set 2 attaches a penalty equal to the value of one identical amino acid pair
to the search for identical amino acid pairs. This penalty will exclude from
consideration any possible pathway that leaves and returns to a principal diagonal,
thereby needing two gaps, in order to add only one or two amino acids to the maximum
match. This set results in a total of 38 + 4 = 42 amino acids matched (the maximum-match
value plus the number of gaps), the number of gaps is reduced to four, and the
significance of the result relative to set 1 appears to be increased. Braunitzer's
comparison would have a value of 37 − 2 = 35 using this variable set; hence it was not
selected by the method.

Variable sets 3 and 4 have an interesting property. Their maximum-match values
can be related to the minimum number of mutations needed to convert the selected
parts of one amino acid sequence into the selected parts of the other. The minimum
number of mutations concept in protein comparisons was first advanced by Fitch
(1966). If the type values for these sets are multiplied by three, they become equal to
their pair type and directly represent the maximum number of corresponding bases
in the codons for a given amino acid pair. Thus the maximum match and penalty
factors may be multiplied by three, making it possible to calculate the maximum
number of bases matched in the combination of amino acid pairs selected by the
maximum-match operation.

β-Hemoglobin, the smaller of the two proteins, contains 146 amino acids;
consequently the highest possible maximum match (disregarding integral amino acid
composition data) with myoglobin is 146 × 3 = 438. Insufficient data are available
to analyze the result from set 3 on the basis of mutations. If it is assumed that the gap
in set 4 does not exclude any part of β-hemoglobin from the comparison, this set has a
maximum of 3(89.63 + 1.03) ≈ 272 bases matched, indicating a minimum of
438 − 272 = 166 point mutations in this combination. Using this variable set and placing
gaps according to Braunitzer, a score of 88.6 was obtained; thus his match was not
selected. Again it may be observed that the penalty greatly enhanced the significance
of the maximum match.
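The set 4 arithmetic written out: the penalty for the single gap is added back to the maximum-match value before scaling by three, under the stated assumption that the gap excludes no part of β-hemoglobin.

```latex
\begin{aligned}
146 \times 3 &= 438 \\
3\,(89.63 + 1.03) &= 3 \times 90.66 = 271.98 \approx 272 \\
438 - 272 &= 166
\end{aligned}
```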

Variable sets 5 and 6 have no intrinsic meaning and were chosen because the weight 
attached to type 2 and type 1 pairs is intermediate in value with respect to sets 1 and 2
and sets 3 and 4. The maximum match for set 6 is seen to have a highly significant 

The data of set 7 are results that would be obtained from using the frame-shift
method to select a maximum match; the penalty was large enough to prevent any
gaps in the comparisons. The slight differences in significance found among the
maximum-match values of β-hemoglobin and myoglobin resulting from use of sets 4, 6
and 7 are probably meaningless due to small sample size and errors introduced by the
assumptions about the distribution functions of random values. Finding a value in set
7 that is approximately equal to those from sets 4 and 6 in significance is not surprising.
A larger penalty factor would have increased the difference from the mean in sets 4
and 6 because almost every random value in each set was the result of more gaps than
were required to form the real maximum match. Further, the gaps that are allowed are
at the N-terminal ends so that about 85% of the comparison can be made without
gaps. If an actual gap were present near the middle of one of the sequences, it would
have caused a sharp reduction in the significance of the frame-shift type of match.

Set 3 is the only variable set in Table 2 that shows a possible difference. Assuming
the value is accurate, other than chance, there is no simple explanation for the
difference. A small but meaningful difference in any comparison could represent
evolutionary divergence or convergence. It is generally accepted that the primary
structure of proteins is the chief determinant of the tertiary structure. Because
certain features of tertiary structure are common to proteins, it is reasonable to
suppose that proteins will exhibit similarities in their sequences, and that these
similarities will be sufficient to cause a significant difference between most protein pairs
and their corresponding randomized sequences, being an example of submolecular
evolutionary convergence. Further, the interactions of the protein backbone, side
chains, and the solvent that determine tertiary structure are, in large measure, forces
arising from the polarity and steric nature of the protein side-chains. There are
conspicuous correlations in the polarity and steric nature of type 2 pairs. Heavy
weighting of these pairs would be expected to enhance the significance of real
maximum-match values if common structural features are present in proteins that are
compared. The presence of sequence similarities does not always imply common
ancestry in proteins. More experimentation will be required before a choice among
the possibilities suggested for the result from set 3 can be made. If several short
sequences of amino acids are common to all proteins, it seems remarkable that the
relationship of ribonuclease to lysozyme in six of the seven variable sets appears to be
truly a random one. It should be noted, however, that the standard value of the real
maximum match is positive in each variable set in this comparison.

This method was designed for the purpose of detecting homology and defining its
nature when it can be shown to exist. Its usefulness for the above purposes depends in
part upon assumptions related to the genetic events that could have occurred in the
evolution of proteins. Starting with the assumption that homologous proteins are the
result of gene duplication and subsequent mutations, it is possible to construct
several hypothetical amino acid sequences that would be expected to show homology.
If one assumes that following the duplication, point mutations occur at a constant
or variable rate, but randomly, along the genes of the two proteins, after a relatively
short period of time the protein pairs will have nearly identical sequences. Detection
of the high degree of homology present can be accomplished by several means, and the
use of values for non-identical pairs will do little to improve the significance of the
results. If no, or very few, deletions (insertions) have occurred, one could expect to
enhance the significance of the match by assigning a relatively high penalty factor.
Later on in time the hypothetical proteins may have a sizable fraction of their amino
acids changed by point mutations, the result being that an attempt to increase the
significance of the maximum match will probably require attaching substantial weight
to those pairs representing amino acids still having two of the three original bases in
their codons. Further, if a few more gaps have occurred, the penalty should be reduced
to a small enough value to allow areas of homology to be linked to one another. At a
still later date in time more emphasis must be placed on non-identical pairs, and
perhaps a very small or even negative penalty factor must be assessed. Eventually it
will be impossible to detect the remaining homology in the hypothetical example
using the approach detailed here.

From consideration of this simple model of protein evolution one may deduce that
the variables which maximize the significance of the difference between real and random
proteins give an indication of the nature of the homology. In the comparison of
human β-hemoglobin to whale myoglobin, the assignment of some weight to type 2
pairs considerably enhances the significance of the result, indicating substantial
evolutionary divergence. Further, few deletions (additions) have apparently occurred.

It is known that the evolutionary divergence manifested by cytochrome (Margoliash,
Needleman & Stewart, 1963) and other heme proteins (Zuckerkandl & Pauling, 196…)
did not follow the simple model outlined above. Their divergence is the result of
non-random mutations along the genes. The degree and type of homology can be
expected to differ between protein pairs. As a consequence of the difference there is
no a priori best set of cell and operation values for maximizing the significance of the
maximum-match value of homologous proteins, and as a corollary to this fact, there
is no best set of values for the purpose of detecting only slight homology. This is an
important consideration, because whether the sequence relationship between proteins
is significant depends solely upon the cell and operation values chosen. If it is found
that the divergence of proteins follows one or two simple models, it may be possible
to derive a set of values that will be most useful in detecting and defining homology.

The most common method for determining the degree of homology between protein
pairs has been to count the number of non-identical pairs (amino acid replacements)
in the homologous comparison and to use this number as a measure of evolutionary
distance between the amino acid sequences. A second, more recent concept has been
to count the minimum number of mutations represented by the non-identical pairs.
This number is probably a more adequate measure of evolutionary distance because it
utilizes more of the available information and theory to give some measure of the
number of genetic events that have occurred in the evolution of the proteins. The
approach outlined in this paper supplies either of these numbers.
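Both measures can be read off a list of aligned pairs. In this sketch, which is an illustration rather than the authors' procedure, a replacement is any non-identical pair, and the minimum number of point mutations for a pair is the fewest base changes between any codon of one amino acid and any codon of the other (equivalently, three minus the maximum number of corresponding bases). Codons come from the standard genetic code in DNA letters with stop codons omitted; the aligned pairs in the usage line are hypothetical.

```python
from itertools import product

# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for bases, aa in zip(product("TCAG", repeat=3), TABLE):  # T stands in for U
    if aa != "*":  # stop codons omitted
        CODONS.setdefault(aa, []).append("".join(bases))


def min_mutations(x: str, y: str) -> int:
    """Fewest base changes turning some codon of x into some codon of y."""
    return min(sum(p != q for p, q in zip(c1, c2))
               for c1 in CODONS[x] for c2 in CODONS[y])


def distance_measures(pairs):
    """Return (number of replacements, minimum number of point mutations)."""
    replacements = sum(x != y for x, y in pairs)
    mutations = sum(min_mutations(x, y) for x, y in pairs)
    return replacements, mutations


print(distance_measures([("L", "L"), ("F", "L"), ("M", "H")]))  # (2, 4)
```

Here F/L differ by one base (TTT to TTA) while M/H share no base in any position, so two replacements account for four point mutations; the second count carries more information than the first, as the text argues.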
Main Entry: mam.mal
Pronunciation: 'ma-m&l 
Function: noun 

Etymology: New Latin Mammalia, from Late Latin, neuter plural of 
mammalis of the breast, from Latin mamma breast 
Date: 1826 

: any of a class (Mammalia) of warm-blooded higher vertebrates (as
placentals, marsupials, or monotremes) that nourish their young with
milk secreted by mammary glands, have the skin usually more or less
covered with hair, and include humans
- mam.ma.li.an /m&-'mA-lE-&n, ma-/ adjective or noun
Pronunciation Key

\&\ as a and u in abut
\*\ as e in kitten
\&r\ as ur/er in further
\a\ as a in ash
\A\ as a in ace
\ä\ as o in mop
\au\ as ou in out
\ch\ as ch in chin
\e\ as e in bet
\E\ as ea in easy
\g\ as g in go
\i\ as i in hit
\I\ as i in ice
\j\ as j in job
\[ng]\ as ng in sing
\O\ as o in go
\o\ as aw in law
\oi\ as oy in boy
\th\ as th in thin
\[th]\ as th in the
\ü\ as oo in loot
\u\ as oo in foot
\y\ as y in yet
\zh\ as si in vision



Main Entry: mu.rine
Pronunciation: 'myur-,In
Function: adjective 

Etymology: ultimately from Latin mur-, mus 
Date: 1607 

: of or relating to a murid genus (Mus) or its subfamily (Murinae) which
includes the common household rats and mice; also : of, relating to, or
involving these rodents and especially the house mouse
- murine noun 

Main Entry: murine typhus 
Function: noun 
Date: 1933 

: a mild febrile disease that is marked by headache and rash, is caused 
by a rickettsia (Rickettsia mooseri), is widespread in nature in rodents, 
and is transmitted to humans by a flea 
Main Entry: 1hu.man
Pronunciation: 'hyu-m&n, 'yu- 
Function: adjective 

Etymology: Middle English humain, from Middle French, from Latin 
humanus; akin to Latin homo human being - more at HOMAGE 
Date: 14th century 

1 : of, relating to, or characteristic of humans 

2 : consisting of humans 

3 a : having human form or attributes b : susceptible to or representative 
of the sympathies and frailties of human nature 

- hu.man.ness /-m&n-n&s/ noun 

Main Entry: 2human
Function: noun 
Date: circa 1533

: a bipedal primate mammal (Homo sapiens) : MAN; broadly : any living
or extinct member of the family (Hominidae) to which the primate 
belongs 

- hu.man.like /-m&n-"lIk/ adjective

Main Entry: human being 
Function: noun 
Date: 1795 
: HUMAN 



Main Entry: human ecology 
Function: noun 
Date: 1907 

1 : a branch of sociology dealing especially with the spatial and temporal 
interrelationships between humans and their economic, social, and
political organization 

2 : the ecology of human communities and populations especially as 
concerned with preservation of environmental quality (as of air or water) 
through proper application of conservation and civil engineering 
practices 

Main Entry: human engineering 
Function: noun 
Date: 1920 

1 : management of humans and their affairs especially in industry 

2 : ERGONOMICS 

Main Entry: human immunodeficiency virus 
Function: noun 
Date: 1986 
: HIV 

Main Entry: human nature 
Function: noun 
Date: 1668 

: the nature of humans; especially : the fundamental dispositions and 
traits of humans 

Main Entry: human relations 

Function: noun plural but usually singular in construction 
Date: 1946 

1 : a study of human problems arising from organizational and 
interpersonal relations (as in industry) 

2 : a course, study, or program designed to develop better interpersonal 
and intergroup adjustments 

Main Entry: human resources 
Function: noun plural 
Date: 1975 
: PERSONNEL 1a, 2

Main Entry: human rights 
Function: noun plural 
Date: 1791 

: rights (as freedom from unlawful imprisonment, torture, and execution) 
regarded as belonging fundamentally to all persons 



©2001, yourDictionary.com Inc. All Rights Reserved. 



http://www.yourdictionary.com/cgi-bin/mw.cgi



1/29/2002 



yourDictionary.com 



Page 1 of 2 



Merriam-Webster's Collegiate Dictionary



5 words found. 

syn.thet.ic[1,adjective]



synthetic[2,noun] 
synthetic division 
synthetic geometry 
synthetic resin 



Main Entry: 1syn.thet.ic
Pronunciation: sin-'the-tik 
Function: adjective 

Etymology: Greek synthetikos of composition, component, from 
syntithenai to put together 
Date: 1697 

1 : relating to or involving synthesis : not analytic 

2 : attributing to a subject something determined by observation rather
than analysis of the nature of the subject and not resulting in
self-contradiction if negated - compare ANALYTIC

3 : characterized by frequent and systematic use of inflected forms to 
express grammatical relationships 

4 a (1) : of, relating to, or produced by chemical or biochemical 
synthesis; especially : produced artificially (2) : of or relating to a synfuel 
b : devised, arranged, or fabricated for special situations to imitate or 
replace usual realities c : FACTITIOUS, BOGUS

- syn.thet.i.cal.ly /-ti-k(&-)lE/ adverb

Main Entry: 2synthetic
Function: noun 
Date: 1916 

: something resulting from synthesis rather than occurring naturally; 
especially : a product (as a drug or plastic) of chemical synthesis 

Main Entry: synthetic division
Function: noun 
Date: 1904 

: a simplified method for dividing a polynomial by another polynomial of
the first degree by writing down only the coefficients of the several 
powers of the variable and changing the sign of the constant term in the
divisor so as to replace the usual subtractions by additions 
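The entry above describes synthetic division only in words. A minimal sketch of the procedure follows, assuming the dividend's coefficients are listed from the highest power down (with zeros for missing powers) and that the first-degree divisor is monic, of the form x + c. The function name `synthetic_division` is illustrative only.

```python
def synthetic_division(coefficients: list[float],
                       divisor_constant: float) -> tuple[list[float], float]:
    """Divide a polynomial by (x + divisor_constant), a hypothetical helper.

    coefficients run from the highest power down, with zeros for missing
    powers. Returns (quotient coefficients, remainder).
    """
    if not coefficients:
        raise ValueError("need at least one coefficient")
    # Changing the sign of the divisor's constant term lets every step
    # add instead of subtract, as in the dictionary definition.
    r = -divisor_constant
    running = [coefficients[0]]
    for c in coefficients[1:]:
        running.append(c + r * running[-1])
    # The last running value is the remainder; the rest form the quotient.
    return running[:-1], running[-1]


# x^3 - 6x^2 + 11x - 6 divided by x - 1
print(synthetic_division([1, -6, 11, -6], -1))  # ([1, -5, 6], 0)
# 2x^2 + 3x + 5 divided by x + 2
print(synthetic_division([2, 3, 5], 2))         # ([2, -1], 7)
```

The second call gives quotient 2x - 1 and remainder 7, which checks out: (x + 2)(2x - 1) + 7 = 2x^2 + 3x + 5.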

Main Entry: synthetic geometry
Function: noun 
Date: 1889 

: elementary euclidean geometry or projective geometry as distinguished
from analytic geometry 

Main Entry: synthetic resin 
Function: noun 
Date: 1907 
: RESIN 2 